AITopics | speedy q-learning

Collaborating Authors

speedy q-learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Speedy Q-Learning

Neural Information Processing SystemsMar-15-2024, 08:00:06 GMT

We introduce a new convergent variant of Q-learning, called speedy Q-learning (SQL), to address the problem of slow convergence in the standard form of the Q-learning algorithm.

action-value function, algorithm, q-learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
Europe > France (0.04)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Speedy Q-Learning

Neural Information Processing SystemsApr-6-2023, 13:03:18 GMT

We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and the discount factor \gamma only T O\big(\log(n)/(\epsilon {2}(1-\gamma) {4})\big) steps are required for the SQL algorithm to converge to an \epsilon-optimal action-value function with high probability. This bound has a better dependency on 1/\epsilon and 1/(1-\gamma), and thus, is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration that are considered to be more efficient than the incremental methods like Q-learning.

algorithm, gamma, speedy q-learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Speedy Q-Learning

Ghavamzadeh, Mohammad, Kappen, Hilbert J., Azar, Mohammad G., Munos, Rémi

Neural Information Processing SystemsFeb-14-2020, 23:57:56 GMT

algorithm, gamma, speedy q-learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Generalized Speedy Q-learning

John, Indu, Kamanchi, Chandramouli, Bhatnagar, Shalabh

arXiv.org Artificial IntelligenceNov-1-2019

In this paper, we derive a generalization of the Speedy Q-learning (SQL) algorithm that was proposed in the Reinforcement Learning (RL) literature to handle slow convergence of Watkins' Q-learning. In most RL algorithms such as Q-learning, the Bellman equation and the Bellman operator play an important role. It is possible to generalize the Bellman operator using the technique of successive relaxation. We use the generalized Bellman operator to derive a simple and efficient family of algorithms called Generalized Speedy Q-learning (GSQL-w) and analyze its finite time performance. We show that GSQL-w has an improved finite time performance bound compared to SQL for the case when the relaxation parameter w is greater than 1. This improvement is a consequence of the contraction factor of the generalized Bellman operator being less than that of the standard Bellman operator. Numerical experiments are provided to demonstrate the empirical performance of the GSQL-w algorithm.

algorithm, q-learning, speedy q-learning, (15 more...)

arXiv.org Artificial Intelligence

1911.00397

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Workflow (0.46)
Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Speedy Q-Learning

Ghavamzadeh, Mohammad, Kappen, Hilbert J., Azar, Mohammad G., Munos, Rémi

Neural Information Processing SystemsDec-31-2011

We introduce a new convergent variant of Q-learning, called speedy Q-learning, to address the problem of slow convergence in the standard form of the Q-learning algorithm. We prove a PAC bound on the performance of SQL, which shows that for an MDP with n state-action pairs and the discount factor \gamma only T=O\big(\log(n)/(\epsilon^{2}(1-\gamma)^{4})\big) steps are required for the SQL algorithm to converge to an \epsilon-optimal action-value function with high probability. This bound has a better dependency on 1/\epsilon and 1/(1-\gamma), and thus, is tighter than the best available result for Q-learning. Our bound is also superior to the existing results for both model-free and model-based instances of batch Q-value iteration that are considered to be more efficient than the incremental methods like Q-learning.

action-value function, algorithm, q-learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.05)
Europe > France (0.04)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback